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1  Introduction 


This  paper  considers  the  problem  of  separating  a  composite  signal  through  the  recovery  of  an  underlying 
sparse  coefficient  vector  by  using  the  Dantzig  selector  given  only  an  incomplete  set  of  noisy  linear  random 
projections.  That  is,  we  discuss  the  estimation  of  a  coefficient  vector  c  £  Cq  given  the  vector 


U  =  X(3  +  z,  (1) 

where  X  £  Rraxp  is  a  sensing  matrix  with  n  <  p,  z  is  a  collection  of  i.i.d.  ~  N(0,a2)  random  variables,  and 
the  unknown  signal  /3  £  admits  the  sparse  representation  j3  =  Be  for  a  known  overcomplete  dictionary 
B  £  Cpxq.  The  individual  signals  composed  to  form  /3  can  then  be  recovered  from  c  and  B.  Since  Equation  (1) 
is  underdetermined  yet  consistent,  it  presents  infinitely  many  candidate  signals  /?  and  coefficient  vectors  c. 

The  Dantzig  selector  was  introduced  in  [10]  as  a  method  for  estimating  a  sparse  parameter  /3  £  Rp 
satisfying  (1).  Discussions  on  the  Dantzig  selector,  including  comparisons  to  the  least  absolute  shrinkage 
and  selection  operator  (LASSO),  can  be  found  in  [4,  6,  10,  11,  17,  19,  25,  27].  Both  the  Dantzig  selector  and 
LASSO  aim  for  sparse  solutions,  but  whereas  LASSO  tries  to  match  the  image  of  candidate  vectors  close  to 
the  observations,  the  Dantzig  selector  aims  to  bound  the  predictor  of  the  residuals.  When  tuning  parameters 
in  LASSO  and  the  Dantzig  selector  model  are  set  properly,  the  LASSO  estimate  is  always  a  feasible  solution 
to  the  Dantzig  selector  minimization  problem,  although  it  may  not  be  an  optimal  solution.  Furthermore, 
when  the  corresponding  solutions  are  not  identical,  the  Dantzig  selector  solution  is  sparser  than  the  LASSO 
solution  in  terms  of  the  l\  norm  [20].  Recently,  the  Dantzig  selector  model  has  been  applied  for  gene  selection 
in  cancer  classification  [29]. 

Classical  compressive  sensing  theory  guarantees  the  recovery  of  a  sparse  signal  given  only  a  very  small 
number  of  linear  projections  under  certain  conditions  [3,  8,  9,  15].  However,  very  seldomly  is  a  naturally 
encountered  signal  perfectly  sparse  relative  to  a  single  basis.  Therefore,  a  number  of  works  have  considered 
the  recoverability  of  signals  that  are  sparse  relative  to  an  overcomplete  dictionary  that  is  formed  by  the 
concatenation  of  several  bases  or  Parseval  frames  [12,  14,  15,  18,  21,  26].  In  this  work,  we  propose  and  analyze 
a  Dantzig  selector  model  inspired  by  the  above  applications  of  overcomplete  dictionaries  in  compressive 
sensing,  and  develop  an  algorithm  for  finding  solutions  to  this  model. 

The  following  notation  will  be  used.  The  absolute  value  of  a  scalar  a  is  denoted  by  |a|,  and  the  number 
of  elements  in  a  set  T  is  denoted  by  |Tj.  The  smallest  integer  larger  than  the  real  number  a  is  denoted  by 
[a].  The  ith  element  of  a  vector  x  is  denoted  by  x(i),  and  the  ith  column  of  a  matrix  A  is  denoted  by  A,: . 
The  support  of  a  vector  x  is  given  by  supp(a;)  =  {i  :  x(i)  0}.  The  £\,  and  £ 2  vector  norms,  denoted  by 
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||  •  ||i,  and  ||  •  || 2  respectively,  are  defined  by 


IMIi  =  l*(*)l  >  Hxll2  =  (  Yl  i^wi 


for  any  vector  x  £  C".  For  matrices  A,  B  with  the  same  number  of  rows,  A  B  is  the  horizontal  concate¬ 


nation  of  A  and  B.  Similarly, 


is  the  vertical  concatenation  of  A  and  B,  provided  each  has  the  same 


number  of  columns.  The  conjugate  transpose  of  a  matrix  A  is  denoted  by  AT . 

The  rest  of  the  paper  is  organized  as  follows.  In  Section  2,  the  Dantzig  selector  model  incorporating 
overcomplete  dictionaries  is  introduced.  In  Section  3,  we  present  an  algorithm  used  to  find  solutions  to  the 
proposed  model.  Section  4  presents  several  numerical  experiments  demonstrating  the  appropriateness  of  the 
model  and  the  accuracy  of  the  results  produced  by  the  presented  algorithm.  In  simulations  using  real-valued 
matrices  in  the  overcomplete  dictionary,  we  compare  the  efficiency  and  accuracy  of  the  presented  method 
with  the  competing  Alternating  Direction  Method.  Additionally,  we  demonstrate  the  utility  of  the  proposed 
algorithm  in  several  experiments  by  applying  it  in  various  domain  applications  including  the  recovery  of 
complex-valued  coefficient  vectors,  the  removal  of  impulse  noise  from  smooth  signals,  and  the  separation  and 
classification  of  a  composition  of  handwritten  digits.  We  close  the  paper  with  some  remarks  and  possible 
future  directions. 


2  The  Dantzig  selector  model  incorporating  overcomplete  dictio¬ 
naries 

In  this  section,  we  present  a  Dantzig  selector  model  incorporating  overcomplete  dictionaries  that  can  be  used 
to  recover  an  unknown  signal  and  reliably  separate  overlapping  signals. 

Suppose  the  unknown  composite  signal  /?  is  measured  via  y  =  Xf3  +  z,  where  X  is  an  n  x  p  sensing  matrix 
and  2  models  sensor  noise,  and  suppose  an  overcomplete  dictionary  B  is  known  such  that  /?  =  Be  for  some 
sparse  c.  Although  (j  and  c  are  not  known,  it  is  reasonable  in  many  applications  to  know  or  suspect  the 
correct  dictionary  components.  For  example,  if  the  signals  of  interest  appear  to  be  sinusoids  with  occasional 
spikes  as  in  Figure  3,  one  should  use  a  dictionary  that  is  a  concatenation  of  a  discrete  Fourier  transform 
component  and  a  standard  Euclidean  basis  component.  In  the  following,  let  q  =  2p  and  assume  the  p  x  q 
dictionary  B  is  formed  by  a  horizontal  concatenation  of  a  pair  of  orthonormal  bases,  B  =  $  vj/  ,  and 

the  components  of  /3  admit  the  sparse  representations  /?$  =  and  /3$  =  ’Fc$,  with  /3  =  /?$  +  and 
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c  = 


.  More  succinctly, 


$  \j> 


Cc[> 

C\j> 


To  recover  c,  and  therefore  also  /3  and  the  components  /?$  and  /3^>,  from  the  observations  y ,  we  propose 
using  a  solution  to  the  Dantzig  selector  model  (see  [10])  with  an  overconrplete  dictionary 


CG  min  {Hell!  :  \\D~1BT XT  {XBc- y)^  <  6}  ,  (2) 

cGC2p 

where  the  diagonal  matrix  D  G  R.qxq  with  entries  djj  =  diagllKXB)^^}  normalizes  the  sensing- dictionary 
pair.  Although  Model  (2)  is  expressed  using  an  overcomplete  dictionary  with  two  representation  systems, 
one  could  generalize  the  model  to  accomodate  more  systems. 

If  the  elements  of  X  are  independent  and  identically  distributed  random  variables  from  a  Gaussian  or 
Bernoulli  distribution,  and  B  contains  elements  of  fixed,  nonrandom  bases,  then  D  is  invertible.  To  see  this, 
note  that  djj  =  0  if  and  only  if  ((XT)j,  Bj )  =  0  for  all  i  G  {1,  2 However,  since  a  random  sensing 
matrix  is  largely  incoherent  with,  yet  not  orthogonal  to  any  fixed  basis  [7,  16,  28],  it  follows  that  djj  ^  0 
for  each  j,  ensuring  D  is  invertible.  Employing  a  sensing  matrix  whose  entries  are  i.i.d.  random  variables 
sampled  from  a  Gaussian  or  Bernoulli  distribution,  paired  with  an  overcomplete  dictionary  formed  by  several 
bases  or  parseval  frames  has  the  added  benefit  of  giving  small  restricted  isometry  constants,  which  in  turn 
improves  the  probability  of  successful  recovery  of  the  coefficient  vector  via  l\  minimization.  More  on  these 
concepts,  now  standard  in  compressive  sensing  literature,  can  be  found  in  [2,  3,  9,  12,  14,  15,  16,  18,  21,  26]. 


3  A  proximity  operator  based  algorithm 

To  compute  the  Dantzig  selector,  we  characterize  a  solution  of  Model  (2)  using  the  fixed  point  of  a  system 
of  equations  involving  applications  of  the  proximity  operator  to  the  i \  norm.  In  this  section  we  describe  the 
system  of  equations  and  their  relationship  to  the  solution  of  Model  (2)  and  present  an  algorithm  with  an 
iterative  approach  for  finding  these  solutions. 

Let  A  =  D~1Bt Xt XB,  and  define  the  vector  7  =  D~x BT XT y  and  the  set  T  =  {c  :  ||c  —  <  5}. 

The  indicator  function  ijr  :  C2p  — »•  {0,oo}  is  defined  by 

{0,  if  u  G  T 
+00,  if  u  ^  T 
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and  the  proximity  operator  of  a  lower  semicontinuous  convex  function  /  with  parameter  A  ^  0  is  defined  by 


proxA/(ai)  =  argmin  <  —  ||u  -  x\\l  +  f(u) 
uec2p  l 


Then  Model  (2)  can  be  expressed  in  terms  of  the  indicator  function  as 


ce  min  {||c||i  +Lf{Ac)}. 

cSC2p 


If  c  is  a  solution  to  Model  (3),  then  for  any  a,  A  >  0  there  exists  a  vector  r  €  C2p  such  that 

c  =  proxi|jv|ji  and  r  =  (/  —  proxt^.)  (Ac  +  r) . 

Furthermore,  given  a  and  A,  if  c  and  r  satisfying  the  above  equations  exist,  then  c  is  a  solution  to  (3), 
and  therefore  also  to  (2).  Using  the  fixed-point  characterization  above,  the  ( k  +  l)th  iteration  of  the  prox¬ 
imity  operator  algorithm  to  find  the  solution  of  the  Dantzig  selector  model  incorporating  an  overcomplete 
dictionary  is 

I  ck+1  =  proxiMti  (ck  -  ±AT(2rk  -  rfe-1))  ,  ^ 

I  Tk+ 1  =  (J-  prox^)  (Ack+1  +  rk)  . 

If  A/a  <  1/||A|||,  the  sequence  {(cfc,rfc)}  converges.  The  proof  follows  those  in  [13,  22].  We  remark  that  the 
proximity  operators  appearing  in  Equation  (4)  can  be  efficiently  computed.  More  precisely,  for  any  positive 
number  A  and  any  vector  u  €  Cd, 

r  -| T 

Pr°xA|M|>)=  proxAM(/ui)  proxAM(u2)  •••  proxAM  (u2p)  > 


prox^(u)-  [prQXl{|._7i|S4}(Ul)  Proxt{|,_T2|^}(u2)  •••  prax^.^^l 


where  for  1  <  i  <  2p 


proxA|.|(uj)  =  max{|w.j|  -  A,  0}— A 

\ui\ 


—  rf% 

Pr°x<.{,._.H<4}(ui)  =  7»  +  max{|uj  -  7il.  <*} ,  *  _  \ 

%  l^n  fi\ 

Summarizing  the  above,  one  has  the  following  proximity  operator  based  algorithm  (POA)  for  approxi¬ 
mating  a  solution  to  Model  (2). 
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Algorithm  1  (POA) 

Initial  Parameters:  The  observations  y  are  known,  and  the  sensing  matrix  X  and  dictionary  B  are  used 
to  define  the  matrix  A ,  the  vector  7  and  the  set  T .  A  parameter  a  >  0  is  chosen. 

Initial  Step:  Initial  guess  r°  =  r_1  =  0,  c°  =  0,  f3°  =  0;  A  =  0.999a/|| A\\2. 

Main  Iterations:  Generate  the  sequence  {(ck,rk)  :  k  £  N}  via  the  iterative  scheme  (4)  until  a  stopping 
criterion  is  met. 

Post-processing:  Use  the  appropriate  post-processing  scheme  to  construct  the  Dantzig  estimator  c  from 
the  final  output  of  the  main  iterative  step,  and  obtain  an  approximate  separation  of  the  signal  components 
from  c  and  B. 


The  main  iteration  process  will  ideally  terminate  once  the  sequence  { (cfc ,  rfc)}  reaches  a  stationary  point. 
In  practice,  we  estimate  this  by  stopping  once  either  of  the  following  are  met: 

|  D~1BtXt  (XBck  -y) 


1. 


<  e,  for  some  0  <  e  <  1,  or 


max{||cfe||2, 1} 

2.  The  support  of  ck  is  stationary  for  ;y  iterations,  for  some  predetermined  positive  integer  77. 


As  noted  in  [10],  the  Dantzig  selector  tends  to  slightly  underestimate  solution  values  within  the  support 
of  the  true  solution.  Let  ( c 00,r00)  denote  the  final  product  of  the  Main  Iterations  step  in  POA,  and  define 
the  set  A  =  supp{c°°}.  Denote  by  c\  the  vector  whose  elements  are  chosen  from  c  with  indices  in  A,  and  by 
.Ba  the  submatrix  whose  columns  are  chosen  from  B  with  indices  in  A.  The  Dantzig  selector  is  estimated 
on  A  by  solving  the  least  squares  problem  ca  =  argmin{||XT  (XB\c  —  y)  || 2 }  and  setting  Cj  =  0  for  i  ^  A. 

The  main  contribution  to  the  computational  complexity  of  POA  is  the  ‘Main  Iterations’  procedure. 
Assume  the  matrices  A  and  AT  are  computed  at  the  beginning  of  the  algorithm,  then  recalled  for  use  in 
Equation  (4).  For  each  iteration  of  the  ‘Main  Iterations’  stage,  computing  (rfc,cfe)  via  Equation  (4)  requires 
0(q2)  multiplications,  and  determining  whether  the  stopping  criteria  have  been  met  contributes  an  additional 
0(q2),  where  q  is  the  length  of  the  coefficient  vector  c.  Therefore,  if  R  iterations  are  required  to  complete 
POA,  then  the  overall  complexity  of  the  algorithm  is  (D(Rq2). 


4  Numerical  Experiments 

In  this  section,  the  separation  of  noisy  undersampled  composite  signals  using  POA  to  solve  Model  (2)  is 
demonstrated.  All  codes  are  implemented  in  MATLAB  R2013b  on  a  workstation  with  an  Intel  i7-3630QM 
CPU  (2.40GHz)  and  16GB  RAM. 

Four  numerical  experiments  will  be  presented.  In  the  first  three  experiments,  composite  signals  are 
simulated,  then  POA  is  applied  to  the  noisy  incomplete  observations  y  =  XBc  +  z  to  recover  and  separate 
the  components.  The  entries  of  the  nxp  sensing  matrix  X  are  sampled  from  the  standard  normal  distribution, 
which  is  then  normalized  so  that  each  column  has  unit  1 2  norm.  The  noise  vector  2  has  i.i.d.  entries  sampled 
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from  the  normal  distribution  N( 0,  cr2),  with  cr  =  0.01  and  0.05  corresponding  to  1%  and  5%  noise  levels 
respectively.  To  measure  the  effeciency  and  accuracy  of  the  algorithm,  we  use  the  CPU  run  time  and  the 
relative  £ 2  error  of  the  recovery  of  the  separate  components.  For  each  experiment,  we  report  the  means  and 
standard  deviations  of  these  measurements  over  50  simulations  for  each  set  of  parameters. 

In  the  fourth  experiment,  real-world  data  taken  from  the  United  States  Postal  Service  handwritten  digits 
data  sets,  are  used  to  sample  the  composite  signal  and  to  train  the  overcomplete  dictionary.  Given  an 
undersampled  signal  y  =  X/3,  where  X  is  an  n  x  p  sensing  matrix  with  entries  taken  from  a  Bernoulli 
distribution  and  /3  is  the  composite  image,  POA  is  used  to  identify  the  digits  and  to  approximate  the 
individual  components  in  the  composite  image. 

To  demonstrate  the  utility  of  POA,  we  also  use  the  Alternating  Direction  Method  (ADM)  developed  in 
[5,  23,  24]  to  estimate  a  solution  to  Model  (2)  in  Experiment  1.  A  comparison  of  the  results  are  summarized 
in  Figure  1  and  Table  1.  POA  and  the  ADM  separated  the  components  of  a  composite  signal  with  similar 
accuracy,  however  POA  was  significantly  faster.  We  do  not  compare  POA  and  the  ADM  in  Experiments 
2  and  3  because  these  problem  domains  involve  complex-valued  dictionaries  and  coefficients,  but  the  ADM 
was  designed  to  solve  real-valued  convex  optimization  problems  only. 

The  effectiveness  of  the  postprocessing  scheme  is  illustrated  in  Figure  2.  Every  other  figure  and  table 
present  only  results  with  postprocessing.  In  the  following,  time  is  measured  in  seconds,  and  the  symbol ' 
over  a  parameter  indicates  its  recovered  value  via  POA  with  postprocessing.  For  notational  convenience,  the 
relative  £2  error  of  the  recovery  of  a  vector  x  is  denoted  by  E{x)  :=  ||ar  —  x||2/||a;||2. 

4.1  Experiment  1 

In  this  experiment,  we  recover  and  separate  a  composite  signal  that  has  a  sparse  representation  in  terms  of 
wavelets  and  discrete  cosine  transforms.  Let  (3  represent  the  composite  signal,  /3$  the  wavelet  component  and 

the  sinusoidal  component,  each  of  length  p.  The  signals  are  simulated  by  generating  a  sparse  coefficient 
vector  c  of  length  2 p  by  selecting  a  support  set  of  size  s  uniformly  at  random  and  sampling  c  on  the  support 
set  from  the  distribution  1V(100, 15).  The  signals  are  observed  via  y  =  XBc  +  z,  where  the  overcomplete 
dictionary  B  is  formed  by  concatenating  matrices  for  the  level  5  Haar  wavelet  decomposition  and  the  discrete 
cosine  transform.  The  simulations  are  performed  50  times  for  each  m  =  1, 2, . . . ,  10  and  each  a  =  0.01,  0.05, 
with  parameters  p  =  256 m  +  512,  n  =  p/4,  s  =  |~n/9] .  The  parameters  for  the  stopping  criteria  of  POA  are 
e  =  10~4  and  ij  =  20. 

Comparisons  of  the  mean  and  standard  deviation  of  CPU  time  and  relative  £2  recovery  error  of  each  of 
the  components  for  50  simulations  at  each  level  of  rn  and  a  using  both  POA  and  the  ADM  to  solve  Model  (2) 
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are  illustrated  in  Figure  1  and  Table  1. 


□ 


(a)  CPU  runtime,  with  a  =  0.01.  (b)  CPU  runtime,  with  a  =  0.05. 

Figure  1:  Comparisons  of  CPU  runtime  of  POA  and  ADM  from  Experiment  1.  The  mean  CPU 
runtime  of  the  50  simulations  of  each  algorithm  is  represented  by  the  points  along  the  curves, 
and  the  vertical  lines  represent  one  standard  deviation  from  the  means.  The  solid  line  with  * 
markers  plot  the  values  obtained  using  POA,  and  the  dotted  line  with  o  markers  plot  the  values 
obtained  using  the  ADM.  The  horizontal  axis  represents  the  parameter  m  determining  the  size 
of  the  system,  and  the  vertical  axis  is  measured  in  seconds. 


4.2  Experiment  2 

In  this  experiment,  we  recover  and  separate  a  composite  signal,  with  one  component  /?$  being  a  dirac  spike 
signal  with  random  sparse  locations  and  values,  and  the  other  component  (3^  being  a  sinusoidal  signal  with 
random  sparse  Fourier  coefficients.  The  parameters  used  in  this  experiment  are  p  =  256 m,  n  =  64m,  and 
s  =  \n/ 9]  for  m  =  1,  2, . . . ,  10,  and  stopping  criteria  parameters  e  =  10-4  and  tj  =  6  for  a  =  0.01  and  ?y  =  30 
for  cr  =  0.05. 

For  each  of  the  50  simulations  for  each  value  of  m  and  cr,  the  experiment  is  set  up  by  generating  a  vector 
c  of  length  2 p  by  selecting  a  support  set  S  of  size  s  uniformly  at  random,  then  sampling  c  on  S  with  i.i.d. 
entries  C(j)  =  A(j)(l  +  |a(j)|),  where  A (j)  is  1  or  —1  and  a(j)  ~  1V(0, 1).  The  signal  is  observed  via  the 
vector  y  =  XBc  +  z ,  with  the  dictionary  B  being  a  concatenation  of  the  identity  matrix  and  the  discrete 
Fourier  transform  matrix,  both  of  size  p  x  p. 

Figure  2  illustrates  the  accuracy  of  the  numerical  separation  by  comparing  the  numerically  recovered 
composite  signal  and  separated  components  (denoted  by  ‘o’)  against  the  exact  values  of  the  signal  and 
components  (denoted  by  “+’)  for  one  simulation  with  m  =  2  and  a  =  0.05.  Table  2  lists  the  mean  and 
standard  deviation  of  the  algorithm  run  time  and  the  relative  ii  recovery  error  of  each  signal  component 
over  the  50  simulations  for  each  m  and  cr.  □ 
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8.7136 
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1.4670 
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1.6065 
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7.6928 
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> 

O 

6 

8.0279 

1.4589 
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1.4392 
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7 
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7.9039 

1.4730 

CA 

’O 
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10 
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(xlO-2) 

(xKT3) 

1 

4.4187 

14.921 
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4.9101 

3.8963 

5.0425 
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7.1261 
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5.8845 
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4.1028 

8.0082 
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m 
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10 

3.9514 

6.2999 

3.9518 

6.2383 
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6.3100 

3.9677 

6.3825 

Table  1:  Comparisons  of  the  relative  £ 2  errors  of  the  recovered  components  from  Experiment  1 
using  POA  and  ADM.  The  means  and  standard  deviations  over  50  simulations  of  the  relative 
errors  are  given  for  the  recovery  of  each  component  in  the  original  signal,  and  for  each  parameter 
m  and  noise  level  a. 
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(xlO-3) 
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0.1076 
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2 

0.2615 
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O 
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II 
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5 
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> 
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O 
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1.1852 

3.4638 

2.3516 

3.4088 
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£ 

9 
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3.4874 

2.3352 

3.3968 

2.5498 

10 

15.4924 

1.4241 

3.4326 

2.3453 

3.4374 

1.4001 

Table  2:  The  means  and  standard  deviations  of  the  CPU  run  time  and  the  relative  £ 2  recovery 
error  of  each  component  for  50  simulations  of  Experiment  2  using  POA  for  each  parameter  m 
and  noise  level  a. 
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(a)  Recovery  of  j3$>  without  postprocessing. 


100  200  300  400  500 


(b)  Recovery  of  fig,  with  postprocessing. 


100  200  300  400  500 


(c)  Recovery  of  1Hc(/3»).  (d)  Recovery  of  3m(/3®). 

Figure  2:  The  separation  of  components  from  a  typical  simulation  of  Experiment  2  with  m  = 
2,  a  =  0.05.  The  plots  in  the  top  row  are  of  the  recovered  values  of  the  component  /3$  without 
and  with  postprocessing.  The  component  fig,  is  complex-valued,  so  the  recovery  of  its  real  and 
imaginary  parts  with  postprocessing  are  displayed  separately  in  the  second  row.  In  each  plot, 
the  exact  values  are  denoted  by  “+’  and  the  values  recovered  by  POA  are  denoted  by  ‘o’. 
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a  = 

Mean 

0.01 

Stcl  Dev 

a  = 

Mean 

0.05 

Std  Dev 
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2.9154 

5.7204xl0-2 

2.8984 
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Table  3:  The  means  and  standard  deviations  of  the  run  time  of  POA  and  the  relative  £ 2  recovery 
norm  of  the  composite  signal  and  each  component  over  50  simulations  of  Experiment  3  for  noise 
levels  a  =  0.01  and  0.05. 

4.3  Experiment  3 

In  this  experiment,  a  specified  composite  signal  is  separated  into  distinct  components.  The  composite  signal 
/3  is  formed  by  combining  the  discretized  sinusoidal  component  /3$  and  the  sparse  spike  component  ,  where 

(27 TX  \  /  7TX  \ 

— —  \  +  sin  J  1  for  x  €  {0, 1, ... ,  1023}, 

and  (3q,  is  formed  by  selecting  a  set  S  of  size  s  and  sampling  on  S  using  the  same  method  applied  to 
c  in  Experiment  2.  The  overcomplete  dictionary  B  is  the  concatenation  of  the  identity  and  the  discrete 
Fourier  transform  matrices,  each  of  size  p  x  p.  Note  that  in  this  experiment,  the  exact  coefficient  vector 
is  not  directly  specified,  so  the  signal  is  observed  according  to  y  —  X(3  +  z.  The  parameters  used  are 
p  =  1024,  n  =  512,  s  =  57,  with  stopping  criteria  e  =  10”6  and  77  =  6  for  a  =  0.01  and  rj  =  30  for  er  =  0.05. 

A  typical  plot  of  the  composite  signal,  the  individual  components,  their  recoveries  and  pointwise  recovery 
error  with  er  =  0.01  are  shown  in  Figure  3.  Table  3  displays  the  means  and  standard  deviations  of  the  run 
time  of  POA  and  relative  £2  recovery  error  of  the  signal  and  each  component  over  50  simulations  for  each 
noise  level.  □ 

4.4  Experiment  4 

In  this  experiment,  composite  signals  of  two  handwritten  digits  are  classified  and  separated  using  POA  and 
standard  principal  component  analysis  techniques.  The  data  are  taken  from  the  USPS  handwritten  digit 
data  sets,  obtained  from  [1],  The  data  set  contains  10  classes,  labeled  zero  through  nine,  of  1100  examples  of 
8-bit  grayscale  16  x  16  images.  Each  image  is  reshaped  as  256-column  vectors.  In  each  class,  998  examples  are 
used  as  the  training  set  and  the  remaining  102  examples  form  the  testing  set.  Denote  by  R[j]  the  collection 
of  vectors  forming  the  training  set,  and  by  T[j]  the  collection  of  vectors  forming  the  testing  set.  By  an  abuse 
of  notation,  also  denote  by  R[j]  the  matrix  whose  columns  come  from  the  jth  collection  of  training  vectors. 
The  composite  vector  f3  is  sampled  from  the  test  data  by  randomly  selecting  two  vectors  from  two  distinct 
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(c)  The  recovered  component  91e(/3$).  (d)  The  recovered  component  /3*. 

Figure  3:  The  separation  of  components  from  a  typical  simulation  of  Experiment  3  with  noise 
level  a  =  0.01.  Subplot  (a)  is  the  exact  composite  signal  0,  which  is  then  undersampled  via 
y  =  X0  +  z.  Subplot  (b)  is  the  pointwise  error  0  —  0,  where  0  is  the  recovered  signal.  Subplots 
(c)  and  (d)  are  the  individual  recovered  components  91e(/3$)  and  0y  separated  using  POA  and 
Model  (2). 
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test  sets,  Pi  €  Tjt  and  /32  €  1}2,  and  taking  f3  =  Pi  +  p2-  Consider  the  observation  y  =  X(3,  where  X  is  a 
random  Bernoulli  128  x  256  sensing  matrix.  The  overcomplete  dictionary  is  learned  from  the  labeled  training 
sets  by  concatenating  the  first  few  principal  components  of  each  matrix  R[j ].  Using  Model  (2)  and  POA, 
we  classify  the  two  digits  in  the  composition  /3,  and  recover  an  approximation  of  the  two  digits  using  the 
procedure  outlined  in  Algorithm  2. 


Algorithm  2  Composite  Handwritten  Digit  Classification  and  Separation 

(1) :  Input  the  observation  y ,  the  training  sets  i?[0], . . . ,  R[ 9],  and  a  positive  integer  k  <  256. 

(2) :  For  each  j,  compute  U\j\,  a  matrix  whose  columns  are  the  first  k  principal  components  of  the  matrix 
R[j].  The  overcomplete  dictionary  is  learned  from  the  training  data  via 

B=[U[  0]  U[  1]  •••  U[  9]]. 

(3) :  Apply  POA  to  y  to  produce  a  sparse  coefficient  vector  c  satisfying  Model  (2)  and  to  recover  the 
composite  vector  /3  =  Be. 

(4)  :  Classify  the  components  of  f3  as  the  indices  j\ ,  j2  yielding  the  smallest  two  values  of 

|(/256  -  Wb1*)/3|2- 

(5) :  Form  a  reduced  dictionary 

B  =  [U\ji]  Ufa]]  ■ 

(6) :  Apply  POA  again  to  y  using  B  to  obtain  a  new  coefficient  vector  c  =  [cT  CJ2]T  satisfying  Model  (2). 
The  two  unknown  components  of  /3  are  approximated  as 

Pji  =  U\ji\ch,  and  Pj2  =  U\j2\ch. 


Let  us  explain  the  motivation  behind  Algorithm  2.  Since  each  unknown  digit  in  the  composite  vector 
should  be  described  well  by  its  corresponding  principal  vectors,  it  follows  that  one  can  approximate  the 
composite  vector  by  (3  =  Be  for  a  sparse  coefficient  vector  c,  and  Model  (2)  is  appropriate  to  use  on  y. 
In  Step  (2),  the  first  k  principal  components  of  each  training  set  R[j]  is  computed  and  used  to  form  the 
overcomplete  dictionary.  One  possible  method  to  find  each  U[j]  is  to  use  the  singular  value  decomposition. 
If  UT,V*  is  a  singular  value  decomposition  of  the  i?[j],  then  U[j]  can  be  taken  as  the  first  k  columns  of  the 
collection  of  left  singular  vectors  U.  In  Step  (4),  the  recovered  composite  vector  $  is  projected  onto  the 
vector  spaces  spanned  by  the  first  k  principal  components  of  each  training  class,  and  the  components  are 
classified  according  to  the  residual  vectors  giving  the  smallest  i2  norm.  In  Steps  (5)  and  (6),  the  dictionary  is 
reduced  based  on  the  most  significant  principal  components  identified  in  Step  (4).  Reducing  the  dictionary 
results  in  a  cleaner  separation  since  only  coefficients  in  the  identified  classes  will  be  recovered. 

Algorithm  2  was  performed  1000  times  for  different  observations  y  with  k  =  30,  and  the  recovered  classi¬ 
fication  indices  were  compared  with  the  true  classification  of  the  two  components,  as  well  as  the  classification 
determined  using  the  smallest  residual  error  from  the  composite  image  /?.  In  95.4%  of  the  simulations,  the 
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classification  accuracy  of  (3  generated  by  POA  given  only  the  random  projections  y  matched  or  exceeded  the 
classification  accuracy  of  the  exact  image  /3.  Examples  of  the  separation  of  two  composite  images  sampled 
from  the  testing  data  sets  using  Algorithm  2  is  illustrated  in  Figure  4.  □ 


Figure  4:  Two  illustrations  of  the  approximation  of  individual  components  composite  images 
using  POA  in  Algorithm  2  with  an  overcomplete  dictionary  learned  from  training  examples  as 
in  Experiment  4.  In  each  collection,  reading  left  to  right,  the  top  row  is  the  exact  composite 
image  /?  followed  by  the  exact  components.  The  bottom  row  is  the  recovered  composite  image 
j3  followed  by  the  recovered  components  /3j1  and  $j2 . 


5  Conclusion 

The  strength  of  Model  (2)  and  POA  in  separating  noisy  undersampled  composite  signals  is  demonstrated 
through  the  numerical  experiments  in  the  previous  section.  Figures  2,  3  and  4  and  Tables  1,  2  and  3  clearly 
illustrate  components  of  a  composite  signal  are  separated  using  POA  with  a  high  degree  of  precision.  In 
particular,  Figure  2  demonstrates  the  method’s  ability  to  distinguish  mixed  signals  that  have  similar  dynamic 
range,  Figure  3  demonstrates  the  effectiveness  of  this  method  to  separate  a  smooth  signal  from  unwanted 
impulse  noise  and  Figure  4  demonstrates  POA  can  be  used  to  separate  overlaid  images  using  an  overcomplete 
dictionary  training  from  labeled  data. 

The  relative  £2  norm  error  of  the  recovery  of  each  of  the  components  remains  fairly  stables  as  m  increases 
for  each  <7,  as  shown  in  Tables  1  and  2.  This  suggests  that  the  method  is  not  greatly  affected  by  the  system 
size,  and  is  stable  and  reliable  in  practice. 

Comparisons  to  ADM  demonstrate  the  advantages  of  using  POA  to  separate  a  noisy  undersampled  com¬ 
posite  signal  and  estimate  solutions  of  Model  (2).  When  the  underlying  coefficient  vector  and  overcomplete 
dictionary  have  only  real-valued  elements,  the  two  methods  have  similar  accuracy  yet  POA  is  significantly 
faster  (Figure  1,  Table  1).  Moreover,  POA  can  be  readily  applied  to  models  involving  complex-valued  over¬ 
complete  dictionaries  and  coefficients,  but  the  ADM  used  for  comparison  was  designed  for  convex  optmization 
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in  real-valued  domains  only.  In  this  sense,  POA  applies  to  a  wider  class  of  problems. 


In  summary,  we  have  introduced  a  model  for  the  Dantzig  selector  incorporating  overconrplete  dictionaries 
that  can  be  used  to  separate  composite  signals.  Additionally,  we  have  proposed  POA,  an  iterative  algorithm 
to  estimate  solutions  to  Model  (2)  and  have  given  the  results  of  several  numerical  experiments  to  support  the 
strength  of  both  the  model  and  the  algorithm.  We  have  shown  through  the  numerical  experiments  that  POA 
is  preferable  over  the  competing  method  ADM,  since  POA  produces  results  with  similar  accuracy  but  with 
less  CPU  time  and  is  applicable  to  a  wider  class  of  problems.  Some  possible  future  applications  based  on 
the  foundation  of  the  separation  of  composite  signals  include  medical  imaging,  image  inpainting  and  feature 
extraction.  Moreover,  advances  in  finding  sparse  descriptions  of  noisy  composite  signals  can  be  applied  to 
improve  technologies  in  signal  compression  and  communications. 
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